AITopics | formal verification

Collaborating Authors

formal verification

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Formal verification for safety evaluation of autonomous vehicles: an interview with Abdelrahman Sayed Sayed

AIHubApr-21-2026, 13:37:43 GMT

In this interview series, we're meeting some of the AAAI/SIGAI Doctoral Consortium participants to find out more about their research. We sat down with Abdelrahman Sayed Sayed to chat about his work on formal verification applied to autonomous vehicles. Could you tell us a bit about where you're studying and the broad topic of your research? My PhD topic is formal verification of neural ODE (ordinary differential equations) for safety evaluation in autonomous vehicles. Could you say something about formal verification and why it's such an important topic?

artificial intelligence, machine learning, verification, (13 more...)

AIHub

Country:

Europe > France (0.06)
Europe > Norway > Central Norway > Trøndelag > Trondheim (0.04)
Europe > Italy (0.04)
Europe > Croatia > Zagreb County > Zagreb (0.04)

Genre: Personal > Interview (1.00)

Industry:

Education (0.47)
Transportation > Ground > Rail (0.30)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.96)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.93)

Add feedback

62c6d7893b13a13c659cb815852dd00d-Supplemental-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing SystemsFeb-15-2026, 10:41:27 GMT

large language model, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Asia > China > Hong Kong (0.04)
Europe > Italy > Lazio > Rome (0.04)
(2 more...)

Industry: Law (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.99)

Add feedback

62c6d7893b13a13c659cb815852dd00d-Paper-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing SystemsFeb-15-2026, 10:41:24 GMT

large language model, machine learning, programming language, (25 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Asia > China > Hong Kong (0.04)
(13 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Add feedback

BRIDGE: Building Representations In Domain Guided Program Verification

George, Robert Joseph, Eisenach, Carson, Ghai, Udaya, Perrault-Joncas, Dominique, Anandkumar, Anima, Foster, Dean

arXiv.org Artificial IntelligenceDec-4-2025

Large language models (LLMs) have achieved impressive results in code generation, yet struggle with program verification, especially in interactive proof frameworks such as Lean4. A central challenge is scalability: verified synthesis requires not just code, but also precise specifications and correctness proofs, and existing approaches rarely span all three domains. We present BRIDGE, the first systematic study of structured prompting for scalable verified program generation. BRIDGE decomposes verification into three interconnected domains: Code (executable implementations), Specifications (formal intent statements), and Proofs (constructive correctness arguments). Our key idea is to elicit distinct reasoning behaviors functional, specification-driven, and proof-oriented as intermediate representations that preserve semantic structure and connect these domains. Through systematic ablations, we show that this approach substantially improves both accuracy and efficiency beyond standard error feedback methods. For example, functional reasoning improves correctness of code in formal languages (Lean4) by nearly 1.5x (pass@5) over direct baselines. In inference-time compute, functional reasoning is also 2x more efficient, achieving higher pass rates with fewer generations and lower total sampling budgets. Similarly, we find that specification-driven prompting boosts Python coding pass rates by up to 17.5%. These findings suggest that structured domain alignment is a promising direction for advancing verified synthesis. BRIDGE establishes a foundation for training via expert iteration or RLVR, enabling models to internalize these reasoning strategies across code, specifications, and proofs.

large language model, machine learning, programming language, (23 more...)

arXiv.org Artificial Intelligence

2511.21104

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.87)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Add feedback

AI for software engineering: from probable to provable

Meyer, Bertrand

arXiv.org Artificial IntelligenceDec-1-2025

Vibe coding, the much-touted use of AI techniques for programming, faces two overwhelming obstacles: the difficulty of specifying goals ("prompt engineering" is a form of requirements engineering, one of the toughest disciplines of software engineering); and the hallucination phenomenon. Programs are only useful if they are correct or very close to correct. The solution? Combine the creativity of artificial intelligence with the rigor of formal specification methods and the power of formal program verification, supported by modern proof tools.

large language model, logic & formal reasoning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2511.23159

Genre: Research Report (0.64)

Technology:

Information Technology > Software Engineering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.50)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.35)

Add feedback

Bridging LLM Planning Agents and Formal Methods: A Case Study in Plan Verification

Ramani, Keshav, Tawosi, Vali, Alamir, Salwa, Borrajo, Daniel

arXiv.org Artificial IntelligenceNov-25-2025

We introduce a novel framework for evaluating the alignment between natural language plans and their expected behavior by converting them into Kripke structures and Linear Temporal Logic (LTL) using Large Language Models (LLMs) and performing model checking. We systematically evaluate this framework on a simplified version of the PlanBench plan verification dataset and report on metrics like Accuracy, Precision, Recall and F1 scores. Our experiments demonstrate that GPT-5 achieves excellent classification performance (F1 score of 96.3%) while almost always producing syntactically perfect formal representations that can act as guarantees. However, the synthesis of semantically perfect formal models remains an area for future exploration.

large language model, logic & formal reasoning, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2510.03469

Country: Europe (0.47)

Genre: Research Report > New Finding (0.68)

Industry: Banking & Finance (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.40)

Add feedback

Towards Formal Verification of LLM-Generated Code from Natural Language Prompts

Councilman, Aaron, Fu, David Jiahao, Gupta, Aryan, Wang, Chengxiao, Grove, David, Wang, Yu-Xiong, Adve, Vikram

arXiv.org Artificial IntelligenceNov-24-2025

In the past few years LLMs have emerged as a tool that can aid programmers by taking natural language descriptions and generating code based on it. However, the reliability of LLM code generation and current validation techniques for it are far from strong enough to be used for mission-critical or safety-critical applications. In this work we explore ways to offer formal guarantees of correctness to LLM generated code; such guarantees could improve the quality of general AI Code Assistants and support their use for critical applications. To address this challenge we propose to incorporate a Formal Query Language that can represent a user's intent in a formally defined but natural language-like manner that a user can confirm matches their intent. We then have a formal specification of the user intent which we can use to verify that LLM-generated code matches the user's intent. We implement these ideas in our system, Astrogator, for the Ansible programming language, widely used for system administration, including for critical systems. The system includes an intuitive formal query language, a calculus for representing the behavior of Ansible programs, and a symbolic interpreter and a unification algorithm which together are used for the verification. A key innovation in Astrogator is the use of a Knowledge Base to capture system-specific implementation dependencies that greatly reduce the need for system knowledge in expressing formal queries. On a benchmark suite of 21 code-generation tasks, our verifier is able to verify correct code in 83% of cases and identify incorrect code in 92%.

large language model, logic & formal reasoning, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2507.1329

Country:

Europe (1.00)
North America > United States > Illinois (0.29)
North America > United States > California (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

VeriThoughts: Enabling Automated Verilog Code Generation using Reasoning and Formal Verification

Yubeaton, Patrick, Nakkab, Andre, Xiao, Weihua, Collini, Luca, Karri, Ramesh, Hegde, Chinmay, Garg, Siddharth

arXiv.org Artificial IntelligenceNov-20-2025

This paper introduces VeriThoughts, a novel dataset designed for reasoning-based Verilog code generation. We establish a new benchmark framework grounded in formal verification methods to evaluate the quality and correctness of generated hardware descriptions. Additionally, we present a suite of specialized small-scale models optimized specifically for Verilog generation. Our work addresses the growing need for automated hardware design tools that can produce verifiably correct implementations from high-level specifications, potentially accelerating the hardware development process while maintaining rigorous correctness guarantees. Our code and data are available at \href{https://github.com/wilyub/VeriThoughts}{this URL}.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2505.20302

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.95)

Add feedback

Position: Vibe Coding Needs Vibe Reasoning: Improving Vibe Coding with Formal Verification

Mitchell, Jacqueline, Shaaban, Yasser

arXiv.org Artificial IntelligenceNov-4-2025

``Vibe coding'' -- the practice of developing software through iteratively conversing with a large language model (LLM) -- has exploded in popularity within the last year. However, developers report key limitations including the accumulation of technical debt, security issues, and code churn to achieve satisfactory results. We argue that these pitfalls result from LLMs' inability to reconcile accumulating human-imposed constraints during vibe coding, with developers inadvertently failing to resolve contradictions because LLMs prioritize user commands over code consistency. Given LLMs' receptiveness to verification-based feedback, we argue that formal methods can mitigate these pitfalls, making vibe coding more reliable. However, we posit that integrating formal methods must transcend existing approaches that combine formal methods and LLMs. We advocate for a side-car system throughout the vibe coding process which: (1) \emph{Autoformalizes} specifications (2) Validates against targets, (3) Delivers \emph{actionable} feedback to the LLM, and (4) Allows intuitive developer influence on specifications.

large language model, natural language, specification, (14 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3759425.3763390

2511.00202

Country: North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (0.34)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback